A major goal of computer vision is to enable computers to interpret visual situations: abstract concepts (e.g., "a person walking a dog," "a crowd waiting for a bus," "a picnic") whose image instantiations are linked more by their common spatial and semantic structure than by low-level visual similarity. In this paper, we propose a novel method for prior learning and active object localization for this kind of knowledge-driven search in static images. In our system, prior situation knowledge is captured by a set of flexible, kernel-based density estimations (a "situation model") that represent the expected spatial structure of the given situation. These estimations are efficiently updated by information gained as the system searches for relevant objects, allowing the system to use context as it is discovered to narrow the search.

More specifically, at any given time in a run on a test image, our system uses image features plus contextual information it has discovered to identify a small subset of training images (an "importance cluster") that is deemed most similar to the given test image, given the context. This subset is used to generate an updated situation model in an on-line fashion, using an efficient multipole expansion technique.

As a proof of concept, we apply our algorithm to a highly varied and challenging dataset consisting of instances of a "dog-walking" situation. Our results support the hypothesis that dynamically rendered, context-based probability models can support efficient object localization in visual situations. Moreover, our approach is general enough to be applied to diverse machine learning paradigms requiring interpretable, probabilistic representations generated from partially observed data.
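To make the two central ideas concrete, here is a minimal NumPy sketch of a kernel-based situation model and its context-driven update. Everything in it is illustrative rather than the paper's actual pipeline: the synthetic walker/dog-offset data, the bandwidth, the cluster size `k`, and the use of plain Euclidean walker-location distance as a stand-in for the system's context-based similarity are all assumptions, and the direct kernel sum below does not reproduce the paper's multipole-expansion speedup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data in normalized [0, 1] image coordinates:
# per training image, the walker's (x, y) location and the dog's
# offset from the walker. Purely synthetic, for illustration only.
n_train = 200
walker_xy = rng.uniform(0.2, 0.8, size=(n_train, 2))
dog_offset = rng.normal(loc=[0.15, 0.0], scale=0.05, size=(n_train, 2))

def gaussian_kde(points, queries, bandwidth=0.05):
    """Evaluate a 2-D Gaussian kernel density estimate at the query points."""
    diffs = queries[:, None, :] - points[None, :, :]          # (q, n, 2)
    sq = np.sum(diffs ** 2, axis=-1) / (2.0 * bandwidth ** 2)  # (q, n)
    norm = points.shape[0] * (2.0 * np.pi * bandwidth ** 2)
    return np.exp(-sq).sum(axis=1) / norm                      # (q,)

# Initial situation model: a prior over dog offsets, fit to all training images.
grid = np.stack(np.meshgrid(np.linspace(-0.3, 0.3, 50),
                            np.linspace(-0.3, 0.3, 50)), axis=-1).reshape(-1, 2)
prior = gaussian_kde(dog_offset, grid)

# Context discovered during the search: suppose the walker has been localized.
detected_walker = np.array([0.6, 0.5])

# "Importance cluster": the k training images most similar to the test image
# given the context (here, nearest walker locations as a simple proxy).
k = 30
nearest = np.argsort(np.sum((walker_xy - detected_walker) ** 2, axis=1))[:k]

# Updated situation model, refit on the cluster alone.
posterior = gaussian_kde(dog_offset[nearest], grid)

best = grid[np.argmax(posterior)]
print("most probable dog location given context:", detected_walker + best)
```

The direct kernel sum above costs O(n * q) per refit over n cluster points and q query locations; the abstract's multipole expansion technique is what makes repeatedly regenerating the situation model cheap enough to run at every step of the search.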